Overview

Dataset statistics

Number of variables: 9
Number of observations: 20640
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.4 MiB
Average record size in memory: 72.0 B
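Both memory figures are consistent with nine numeric columns of 8 bytes each. The check below assumes float64 storage, which the report does not state explicitly.

```python
# Assumption: every column is stored as float64, i.e. 8 bytes per value.
N_ROWS = 20640
N_COLUMNS = 9
BYTES_PER_VALUE = 8

record_size = N_COLUMNS * BYTES_PER_VALUE  # bytes per row
total_bytes = N_ROWS * record_size         # column data only; the index is ignored
total_mib = total_bytes / 1024 ** 2        # 1 MiB = 1024 * 1024 bytes

print(f"Average record size in memory: {record_size:.1f} B")
print(f"Total size in memory: {total_mib:.1f} MiB")
```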

Variable types

NUM: 9

Reproduction

Analysis started: 2020-08-25 00:02:41.017466
Analysis finished: 2020-08-25 00:02:55.109927
Duration: 14.09 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration file: config.yaml
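The reported duration is the difference between the two timestamps. It can be checked with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2020-08-25 00:02:41.017466")
finished = datetime.fromisoformat("2020-08-25 00:02:55.109927")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```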

Warnings

total_bedrooms is highly correlated with total_rooms and 1 other field (High correlation)
total_rooms is highly correlated with total_bedrooms and 1 other field (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
population is highly correlated with households (High correlation)
longitude is highly correlated with latitude (High correlation)
latitude is highly correlated with longitude (High correlation)
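Each warning lists the variables whose correlation with the named variable is high. The exact rule and threshold come from the profiling configuration, and the report does not reproduce them. The sketch below illustrates the idea under two assumptions: it uses an absolute-correlation cutoff of 0.9, and `high_correlation_warnings` is a hypothetical helper, not the library's code.

```python
def high_correlation_warnings(names, corr, threshold=0.9):
    """Map each variable to the other variables whose |correlation| exceeds threshold.

    `corr` is a square, symmetric correlation matrix (list of lists) whose
    rows and columns follow the order of `names`. The 0.9 default is an
    illustrative assumption.
    """
    warnings = {}
    for i, name in enumerate(names):
        partners = [
            names[j]
            for j in range(len(names))
            if j != i and abs(corr[i][j]) > threshold
        ]
        if partners:
            warnings[name] = partners
    return warnings


# Toy matrix with made-up values (not taken from this dataset).
names = ["a", "b", "c"]
corr = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.92],
    [0.10, 0.92, 1.00],
]
for name, partners in high_correlation_warnings(names, corr).items():
    print(f"{name} is highly correlated with {', '.join(partners)}")
```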

Variables

median_income
Real number (ℝ≥0)

Distinct count: 12928
Unique (%): 62.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8706710030346416
Minimum: 0.4999000132083893
Maximum: 15.000100135803224
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 0.4999000132
5-th percentile: 1.600570005
Q1: 2.563399971
median: 3.534799933
Q3: 4.743250132
95-th percentile: 7.300305104
Maximum: 15.00010014
Range: 14.50020012
Interquartile range (IQR): 2.179850161
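pandas' `Series.quantile` interpolates linearly between order statistics by default. Assuming the report relies on that default, the quantile statistics above can be reproduced as follows. The function names are illustrative.

```python
import math


def quantile(sorted_values, q):
    """Quantile q (0 <= q <= 1) by linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def quantile_statistics(values):
    """The rows of the 'Quantile statistics' block for a list of numbers."""
    data = sorted(values)
    q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
    return {
        "Minimum": data[0],
        "5-th percentile": quantile(data, 0.05),
        "Q1": q1,
        "median": quantile(data, 0.5),
        "Q3": q3,
        "95-th percentile": quantile(data, 0.95),
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Interquartile range (IQR)": q3 - q1,
    }


for name, value in quantile_statistics(list(range(1, 11))).items():
    print(f"{name}: {value}")
```

Range and IQR follow directly from the other rows. For median_income, 15.00010014 - 0.4999000132 = 14.50020012 and 4.743250132 - 2.563399971 = 2.179850161.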

Descriptive statistics

Standard deviation: 1.899821718
Coefficient of variation (CV): 0.4908249027
Kurtosis: 4.952524208
Mean: 3.870671003
Median Absolute Deviation (MAD): 1.064200044
Skewness: 1.646656713
Sum: 79890.6495
Variance: 3.609322562
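The descriptive statistics can be sketched with the standard library. This assumes the report uses pandas' sample estimators: an (n - 1) denominator for variance, and the bias-corrected skewness and excess kurtosis described for `Series.skew` and `Series.kurt`. MAD here is the median absolute deviation from the median, as the label says.

```python
import math
import statistics


def descriptive_statistics(values):
    """Summary statistics for a list of at least 4 numbers.

    Variance uses the sample (n - 1) denominator. Skewness is the adjusted
    Fisher-Pearson coefficient; kurtosis is bias-corrected excess kurtosis
    (0 for a normal distribution).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    m2 = sum(d ** 2 for d in deviations)  # sums of powered deviations
    m3 = sum(d ** 3 for d in deviations)
    m4 = sum(d ** 4 for d in deviations)
    variance = m2 / (n - 1)
    std = math.sqrt(variance)
    median = statistics.median(values)
    skewness = n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5
    kurtosis = (n * (n + 1) * (n - 1) * m4) / ((n - 2) * (n - 3) * m2 ** 2) - (
        3 * (n - 1) ** 2
    ) / ((n - 2) * (n - 3))
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": kurtosis,
        "Mean": mean,
        "Median Absolute Deviation (MAD)": statistics.median([abs(x - median) for x in values]),
        "Skewness": skewness,
        "Sum": sum(values),
        "Variance": variance,
    }


stats = descriptive_statistics([1, 2, 3, 4, 5])
for name, value in stats.items():
    print(f"{name}: {value:.10g}")
```

The relations hold for the rounded median_income figures above. The CV is 1.899821718 / 3.870671003 ≈ 0.4908, and 1.899821718² ≈ 3.6093 matches the Variance line.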
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
3.125                 49      0.2%
15.00010014           49      0.2%
2.875                 46      0.2%
2.625                 44      0.2%
4.125                 44      0.2%
3.875                 41      0.2%
3.375                 38      0.2%
3                     38      0.2%
4                     37      0.2%
3.625                 37      0.2%
4.375                 35      0.2%
2.125                 33      0.2%
2.375                 32      0.2%
4.625                 31      0.2%
3.5                   30      0.1%
2.25                  29      0.1%
4.875                 29      0.1%
3.25                  29      0.1%
1.625                 29      0.1%
3.75                  29      0.1%
2.5                   28      0.1%
4.25                  28      0.1%
3.6875                26      0.1%
2.75                  25      0.1%
4.5                   24      0.1%
Other values (12903)  19780   95.8%

Minimum 10 values

Value                 Count   Frequency (%)
0.4999000132          12      0.1%
0.5360000134          10      < 0.1%
0.5494999886          1       < 0.1%
0.6432999969          1       < 0.1%
0.6775000095          1       < 0.1%
0.6825000048          1       < 0.1%
0.6830999851          1       < 0.1%
0.69599998            1       < 0.1%
0.6991000175          1       < 0.1%
0.700699985           1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
15.00010014           49      0.2%
15                    2       < 0.1%
14.90089989           1       < 0.1%
14.58329964           1       < 0.1%
14.4218998            1       < 0.1%
14.41129971           1       < 0.1%
14.29590034           1       < 0.1%
14.28670025           1       < 0.1%
13.94699955           1       < 0.1%
13.85560036           1       < 0.1%
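A histogram with fixed size bins divides the interval from the minimum to the maximum into equal-width bins and counts the values in each. The sketch below uses bins=10 and closes the last bin on the right so that the maximum is counted, which is NumPy's `histogram` convention. `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(values, bins=10):
    """Return (edges, counts) for `bins` equal-width bins spanning min..max.

    Bins are half-open [left, right) except the last, which also includes
    the maximum. Values lying exactly on an interior edge may fall on either
    side because of floating-point rounding.
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    edges = [low + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in values:
        index = 0 if width == 0 else int((x - low) / width)
        index = min(index, bins - 1)  # the maximum goes into the last bin
        counts[index] += 1
    return edges, counts


edges, counts = fixed_width_histogram([0.5, 1.2, 2.8, 3.1, 3.3, 7.9, 15.0], bins=10)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:.2f}, {right:.2f}): {count}")
```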

housing_median_age
Real number (ℝ≥0)

Distinct count: 52
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.639486434108527
Minimum: 1.0
Maximum: 52.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58555761
Coefficient of variation (CV): 0.4394477408
Kurtosis: -0.8006288536
Mean: 28.63948643
Median Absolute Deviation (MAD): 10
Skewness: 0.0603306376
Sum: 591119
Variance: 158.3962604
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
52                    1273    6.2%
36                    862     4.2%
35                    824     4.0%
16                    771     3.7%
17                    698     3.4%
34                    689     3.3%
26                    619     3.0%
33                    615     3.0%
18                    570     2.8%
25                    566     2.7%
32                    565     2.7%
37                    537     2.6%
15                    512     2.5%
19                    502     2.4%
27                    488     2.4%
24                    478     2.3%
30                    476     2.3%
28                    471     2.3%
20                    465     2.3%
29                    461     2.2%
31                    458     2.2%
23                    448     2.2%
21                    446     2.2%
14                    412     2.0%
22                    399     1.9%
Other values (27)     6035    29.2%

Minimum 10 values

Value                 Count   Frequency (%)
1                     4       < 0.1%
2                     58      0.3%
3                     62      0.3%
4                     191     0.9%
5                     244     1.2%
6                     160     0.8%
7                     175     0.8%
8                     206     1.0%
9                     205     1.0%
10                    264     1.3%

Maximum 10 values

Value                 Count   Frequency (%)
52                    1273    6.2%
51                    48      0.2%
50                    136     0.7%
49                    134     0.6%
48                    177     0.9%
47                    198     1.0%
46                    245     1.2%
45                    294     1.4%
44                    356     1.7%
43                    353     1.7%
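The common-values table and the Distinct count and Unique (%) rows come from counting how often each value occurs. In the figures above, Unique (%) equals the distinct count divided by the number of observations (52 / 20640 ≈ 0.3%). The sketch below uses `collections.Counter`. `common_values` is a hypothetical helper, and `top_k=25` mirrors the 25 rows listed before the "Other values" row.

```python
from collections import Counter


def common_values(values, top_k=25):
    """Distinct count, distinct percentage and the most frequent values.

    Rows are (value, count, percent); distinct values beyond the top_k are
    folded into a final "Other values (N)" row.
    """
    n = len(values)
    counts = Counter(values)
    rows = [(value, count, 100 * count / n) for value, count in counts.most_common(top_k)]
    other_distinct = len(counts) - len(rows)
    if other_distinct > 0:
        other_count = n - sum(count for _, count, _ in rows)
        rows.append((f"Other values ({other_distinct})", other_count, 100 * other_count / n))
    distinct_pct = 100 * len(counts) / n  # the report's "Unique (%)"
    return len(counts), distinct_pct, rows


distinct, distinct_pct, rows = common_values([52, 52, 36, 35, 35, 35, 16], top_k=2)
print(f"Distinct count: {distinct}, Unique (%): {distinct_pct:.1f}%")
for value, count, pct in rows:
    print(f"{value}  {count}  {pct:.1f}%")
```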

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5926
Unique (%): 28.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2635.7630813953488
Minimum: 2.0
Maximum: 39320.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 620.95
Q1: 1447.75
median: 2127
Q3: 3148
95-th percentile: 6213.2
Maximum: 39320
Range: 39318
Interquartile range (IQR): 1700.25

Descriptive statistics

Standard deviation: 2181.615252
Coefficient of variation (CV): 0.8276977802
Kurtosis: 32.630927
Mean: 2635.763081
Median Absolute Deviation (MAD): 797
Skewness: 4.147343451
Sum: 54402150
Variance: 4759445.106
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
1527                  18      0.1%
1613                  17      0.1%
1582                  17      0.1%
2127                  16      0.1%
1703                  15      0.1%
1471                  15      0.1%
2053                  15      0.1%
1722                  15      0.1%
1607                  15      0.1%
1717                  15      0.1%
1787                  14      0.1%
1705                  14      0.1%
1743                  14      0.1%
1650                  14      0.1%
1880                  14      0.1%
1731                  14      0.1%
1745                  14      0.1%
1724                  14      0.1%
1562                  14      0.1%
1808                  13      0.1%
1999                  13      0.1%
1551                  13      0.1%
1748                  13      0.1%
1649                  13      0.1%
1701                  13      0.1%
Other values (5901)   20278   98.2%

Minimum 10 values

Value                 Count   Frequency (%)
2                     1       < 0.1%
6                     1       < 0.1%
8                     1       < 0.1%
11                    1       < 0.1%
12                    1       < 0.1%
15                    2       < 0.1%
16                    1       < 0.1%
18                    4       < 0.1%
19                    2       < 0.1%
20                    2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
39320                 1       < 0.1%
37937                 1       < 0.1%
32627                 1       < 0.1%
32054                 1       < 0.1%
30450                 1       < 0.1%
30405                 1       < 0.1%
30401                 1       < 0.1%
28258                 1       < 0.1%
27870                 1       < 0.1%
27700                 1       < 0.1%
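The "Minimum 10 values" and "Maximum 10 values" tables list the smallest and largest distinct values with their counts. The maximum table is sorted from the largest value down. The sketch below shows one way to produce both tables; `extreme_values` is a hypothetical helper.

```python
from collections import Counter


def extreme_values(values, k=10):
    """The k smallest and k largest distinct values, each paired with its count."""
    counts = Counter(values)
    ordered = sorted(counts)  # distinct values, ascending
    smallest = [(v, counts[v]) for v in ordered[:k]]
    largest = [(v, counts[v]) for v in reversed(ordered[-k:])]  # largest first
    return smallest, largest


smallest, largest = extreme_values([2, 6, 15, 15, 39320, 37937, 2127], k=3)
print("Minimum values:", smallest)
print("Maximum values:", largest)
```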

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1928
Unique (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8980135658915
Minimum: 1.0
Maximum: 6445.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 136
Q1: 295
median: 435
Q3: 647
95-th percentile: 1276.05
Maximum: 6445
Range: 6444
Interquartile range (IQR): 352

Descriptive statistics

Standard deviation: 421.2479059
Coefficient of variation (CV): 0.7831371288
Kurtosis: 21.92349542
Mean: 537.8980136
Median Absolute Deviation (MAD): 163
Skewness: 3.453072752
Sum: 11102215
Variance: 177449.7983
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
280                   55      0.3%
331                   52      0.3%
343                   50      0.2%
345                   50      0.2%
393                   49      0.2%
394                   49      0.2%
309                   48      0.2%
348                   48      0.2%
328                   48      0.2%
314                   47      0.2%
272                   47      0.2%
399                   46      0.2%
291                   46      0.2%
388                   46      0.2%
295                   46      0.2%
317                   46      0.2%
313                   46      0.2%
322                   46      0.2%
346                   45      0.2%
287                   45      0.2%
365                   45      0.2%
290                   45      0.2%
284                   45      0.2%
340                   45      0.2%
390                   44      0.2%
Other values (1903)   19461   94.3%

Minimum 10 values

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     2       < 0.1%
3                     5       < 0.1%
4                     7       < 0.1%
5                     6       < 0.1%
6                     5       < 0.1%
7                     6       < 0.1%
8                     8       < 0.1%
9                     7       < 0.1%
10                    8       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
6445                  1       < 0.1%
6210                  1       < 0.1%
5471                  1       < 0.1%
5419                  1       < 0.1%
5290                  1       < 0.1%
5033                  1       < 0.1%
5027                  1       < 0.1%
4957                  1       < 0.1%
4952                  1       < 0.1%
4819                  1       < 0.1%
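Each variable's Missing, Infinite and Zeros rows are simple counts over the column, and all of them are zero for this dataset. The sketch below treats both `None` and NaN as missing, following pandas' notion of missing values; that choice is an assumption about how the report counts them. `data_quality_counts` is a hypothetical helper.

```python
import math


def is_missing(x):
    """True for None or a float NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def data_quality_counts(values):
    """Counts and percentages of missing, infinite (+inf or -inf) and zero entries."""
    n = len(values)
    present = [x for x in values if not is_missing(x)]
    counts = {
        "Missing": n - len(present),
        "Infinite": sum(1 for x in present if math.isinf(x)),
        "Zeros": sum(1 for x in present if x == 0),
    }
    return {name: (count, 100 * count / n) for name, count in counts.items()}


for name, (count, pct) in data_quality_counts([129.0, 1106.0, 190.0, 235.0]).items():
    print(f"{name}: {count} ({pct:.1f}%)")
```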

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3888
Unique (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1425.4767441860465
Minimum: 3.0
Maximum: 35682.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 348
Q1: 787
median: 1166
Q3: 1725
95-th percentile: 3288
Maximum: 35682
Range: 35679
Interquartile range (IQR): 938

Descriptive statistics

Standard deviation: 1132.462122
Coefficient of variation (CV): 0.7944444737
Kurtosis: 73.55311639
Mean: 1425.476744
Median Absolute Deviation (MAD): 440
Skewness: 4.935858227
Sum: 29421840
Variance: 1282470.457
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
891                   25      0.1%
761                   24      0.1%
1227                  24      0.1%
850                   24      0.1%
1052                  24      0.1%
825                   23      0.1%
999                   22      0.1%
782                   22      0.1%
1005                  22      0.1%
781                   21      0.1%
1098                  21      0.1%
753                   21      0.1%
872                   21      0.1%
1056                  20      0.1%
1158                  20      0.1%
899                   20      0.1%
837                   20      0.1%
804                   20      0.1%
1011                  20      0.1%
926                   20      0.1%
1155                  20      0.1%
1203                  20      0.1%
1047                  20      0.1%
986                   20      0.1%
861                   20      0.1%
Other values (3863)   20106   97.4%

Minimum 10 values

Value                 Count   Frequency (%)
3                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
8                     4       < 0.1%
9                     2       < 0.1%
11                    1       < 0.1%
13                    4       < 0.1%
14                    3       < 0.1%
15                    2       < 0.1%
17                    2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
35682                 1       < 0.1%
28566                 1       < 0.1%
16305                 1       < 0.1%
16122                 1       < 0.1%
15507                 1       < 0.1%
15037                 1       < 0.1%
13251                 1       < 0.1%
12873                 1       < 0.1%
12427                 1       < 0.1%
12203                 1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1815
Unique (%): 8.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 499.5396802325581
Minimum: 1.0
Maximum: 6082.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 125
Q1: 280
median: 409
Q3: 605
95-th percentile: 1162
Maximum: 6082
Range: 6081
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 382.3297528
Coefficient of variation (CV): 0.7653641301
Kurtosis: 22.05798806
Mean: 499.5396802
Median Absolute Deviation (MAD): 151
Skewness: 3.410437712
Sum: 10310499
Variance: 146176.0399
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
306                   57      0.3%
386                   56      0.3%
335                   56      0.3%
282                   55      0.3%
429                   54      0.3%
375                   53      0.3%
284                   51      0.2%
297                   51      0.2%
362                   50      0.2%
380                   50      0.2%
278                   50      0.2%
340                   50      0.2%
316                   49      0.2%
329                   49      0.2%
319                   49      0.2%
330                   49      0.2%
377                   48      0.2%
309                   48      0.2%
426                   48      0.2%
341                   48      0.2%
357                   47      0.2%
352                   46      0.2%
363                   46      0.2%
410                   46      0.2%
269                   46      0.2%
Other values (1790)   19388   93.9%

Minimum 10 values

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     3       < 0.1%
3                     4       < 0.1%
4                     4       < 0.1%
5                     7       < 0.1%
6                     5       < 0.1%
7                     10      < 0.1%
8                     8       < 0.1%
9                     9       < 0.1%
10                    7       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
6082                  1       < 0.1%
5358                  1       < 0.1%
5189                  1       < 0.1%
5050                  1       < 0.1%
4930                  1       < 0.1%
4855                  1       < 0.1%
4769                  1       < 0.1%
4616                  1       < 0.1%
4490                  1       < 0.1%
4372                  1       < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 862
Unique (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.63186143109965
Minimum: 32.540000915527344
Maximum: 41.95000076293945
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 32.54000092
5-th percentile: 32.81999969
Q1: 33.93000031
median: 34.25999832
Q3: 37.70999908
95-th percentile: 38.95999908
Maximum: 41.95000076
Range: 9.409999847
Interquartile range (IQR): 3.779998779

Descriptive statistics

Standard deviation: 2.135952381
Coefficient of variation (CV): 0.05994501255
Kurtosis: -1.11775977
Mean: 35.63186143
Median Absolute Deviation (MAD): 1.229999542
Skewness: 0.4659530068
Sum: 735441.6199
Variance: 4.562292572
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
34.06000137           244     1.2%
34.04999924           236     1.1%
34.08000183           234     1.1%
34.06999969           231     1.1%
34.04000092           221     1.1%
34.09000015           212     1.0%
34.02000046           208     1.0%
34.09999847           203     1.0%
34.02999878           193     0.9%
33.93000031           181     0.9%
33.93999863           175     0.8%
33.97000122           172     0.8%
33.99000168           168     0.8%
33.88000107           164     0.8%
33.97999954           162     0.8%
34.11000061           162     0.8%
34.15999985           159     0.8%
34.11999893           158     0.8%
34.15000153           157     0.8%
34.00999832           156     0.8%
33.88999939           154     0.7%
34.16999817           154     0.7%
34.13999939           152     0.7%
34                    152     0.7%
33.90000153           152     0.7%
Other values (837)    16080   77.9%

Minimum 10 values

Value                 Count   Frequency (%)
32.54000092           1       < 0.1%
32.54999924           3       < 0.1%
32.56000137           10      < 0.1%
32.56999969           18      0.1%
32.58000183           26      0.1%
32.59000015           11      0.1%
32.59999847           9       < 0.1%
32.61000061           14      0.1%
32.61999893           13      0.1%
32.63000107           18      0.1%

Maximum 10 values

Value                 Count   Frequency (%)
41.95000076           2       < 0.1%
41.91999817           1       < 0.1%
41.88000107           1       < 0.1%
41.86000061           3       < 0.1%
41.84000015           1       < 0.1%
41.81999969           1       < 0.1%
41.81000137           2       < 0.1%
41.79999924           3       < 0.1%
41.79000092           1       < 0.1%
41.77999878           3       < 0.1%
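Several values in this report carry long decimal tails, for example a latitude minimum of 32.540000915527344 and a common value shown as 34.06000137. These tails look like two-decimal coordinates that were stored as 32-bit floats and later widened to 64-bit. Rounding 32.54 to the nearest IEEE 754 single-precision value gives exactly 32.54000091552734375. This is an interpretation of the numbers, not something the report states. The standard-library check below round-trips values through float32 with `struct`; `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for value in (32.54, 41.95, 34.06):
    print(value, "->", repr(to_float32(value)))
```

The printed values for 32.54 and 41.95 should match the latitude Minimum and Maximum listed above.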

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct count: 844
Unique (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.56970444871473
Minimum: -124.3499984741211
Maximum: -114.30999755859376
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: -124.3499985
5-th percentile: -122.4700012
Q1: -121.8000031
median: -118.4899979
Q3: -118.0100021
95-th percentile: -117.0800018
Maximum: -114.3099976
Range: 10.04000092
Interquartile range (IQR): 3.790000916

Descriptive statistics

Standard deviation: 2.003531743
Coefficient of variation (CV): -0.01675618211
Kurtosis: -1.330152327
Mean: -119.5697044
Median Absolute Deviation (MAD): 1.279998779
Skewness: -0.297801235
Sum: -2467918.7
Variance: 4.014139445
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
-118.3099976          162     0.8%
-118.3000031          160     0.8%
-118.2900009          148     0.7%
-118.2699966          144     0.7%
-118.3199997          142     0.7%
-118.2799988          141     0.7%
-118.3499985          140     0.7%
-118.3600006          138     0.7%
-118.1900024          135     0.7%
-118.25               128     0.6%
-118.3700027          128     0.6%
-118.1999969          126     0.6%
-118.1399994          125     0.6%
-118.2600021          121     0.6%
-118.1299973          121     0.6%
-118.1800003          120     0.6%
-118.3399963          119     0.6%
-118.2099991          118     0.6%
-118.1500015          116     0.6%
-118.1200027          112     0.5%
-118.0999985          109     0.5%
-118.3799973          107     0.5%
-118.4300003          106     0.5%
-118.1699982          106     0.5%
-118.1600037          103     0.5%
Other values (819)    17465   84.6%

Minimum 10 values

Value                 Count   Frequency (%)
-124.3499985          1       < 0.1%
-124.3000031          2       < 0.1%
-124.2699966          1       < 0.1%
-124.2600021          1       < 0.1%
-124.25               1       < 0.1%
-124.2300034          3       < 0.1%
-124.2200012          1       < 0.1%
-124.2099991          3       < 0.1%
-124.1900024          4       < 0.1%
-124.1800003          6       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
-114.3099976          1       < 0.1%
-114.4700012          1       < 0.1%
-114.4899979          1       < 0.1%
-114.5500031          1       < 0.1%
-114.5599976          1       < 0.1%
-114.5699997          3       < 0.1%
-114.5800018          2       < 0.1%
-114.5899963          2       < 0.1%
-114.5999985          3       < 0.1%
-114.6100006          3       < 0.1%

target
Real number (ℝ≥0)

Distinct count: 3842
Unique (%): 18.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206855.81690891474
Minimum: 14999.0
Maximum: 500001.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66200
Q1: 119600
median: 179700
Q3: 264725
95-th percentile: 489810
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145125

Descriptive statistics

Standard deviation: 115395.6159
Coefficient of variation (CV): 0.55785531
Kurtosis: 0.3278702429
Mean: 206855.8169
Median Absolute Deviation (MAD): 68400
Skewness: 0.9777632739
Sum: 4269504061
Variance: 1.331614816e+10
(Figure: histogram with fixed size bins, bins=10)

Common values

Value                 Count   Frequency (%)
500001                965     4.7%
137500                122     0.6%
162500                117     0.6%
112500                103     0.5%
187500                93      0.5%
225000                92      0.4%
350000                79      0.4%
87500                 78      0.4%
275000                65      0.3%
150000                64      0.3%
175000                63      0.3%
100000                62      0.3%
125000                56      0.3%
67500                 55      0.3%
250000                47      0.2%
200000                46      0.2%
118800                39      0.2%
450000                37      0.2%
156300                35      0.2%
212500                33      0.2%
193800                31      0.2%
181300                31      0.2%
300000                30      0.1%
75000                 30      0.1%
81300                 29      0.1%
Other values (3817)   18238   88.4%

Minimum 10 values

Value                 Count   Frequency (%)
14999                 4       < 0.1%
17500                 1       < 0.1%
22500                 4       < 0.1%
25000                 1       < 0.1%
26600                 1       < 0.1%
26900                 1       < 0.1%
27500                 1       < 0.1%
28300                 1       < 0.1%
30000                 2       < 0.1%
32500                 4       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
500001                965     4.7%
500000                27      0.1%
499100                1       < 0.1%
499000                1       < 0.1%
498800                1       < 0.1%
498700                1       < 0.1%
498600                1       < 0.1%
498400                1       < 0.1%
497600                1       < 0.1%
497400                1       < 0.1%

Interactions

(Figure: 81 pairwise interaction plots, one for each ordered pair of the 9 variables)

Correlations

(Figure: correlation heatmap)

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship Y = a + bX with b ≠ 0, r is +1 or -1 whatever the slope, so the angle of the line to the x-axis affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
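The definition translates directly into code. This is a standard-library sketch; `pearson_r` is an illustrative name, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # The 1/(n - 1) factors of the covariance and of both standard deviations cancel.
    return cov / (std_x * std_y)


xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 1.0, 5.0, 6.0]
print(pearson_r(xs, ys))
# Rescaling and shifting X leaves r unchanged (location and scale invariance).
print(pearson_r([10 * x + 3 for x in xs], ys))
```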
(Figure: correlation heatmap)

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one replaces each value by its rank and divides the covariance of the rank variables by the product of their standard deviations. In other words, ρ is Pearson's r computed on the ranks.
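A standard-library sketch of Spearman's ρ as Pearson's r on ranks. It assumes the usual convention of giving tied values the average of the ranks they span. The function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of X and Y."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)


xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))  # nonlinear but monotonic
```

Because y = x³ is strictly increasing, the ranks of X and Y coincide and ρ is 1, whereas Pearson's r on the same data is below 1.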
(Figure: correlation heatmap)

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

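To calculate τ for two variables X and Y, one compares every pair of observations. A pair is concordant if X and Y order the two observations the same way, and discordant if they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2. This is the τ-a form; the τ-b variant adjusts the denominator for ties.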
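The pair-counting definition can be written as a short O(n²) sketch. It implements τ-a. Libraries such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so results can differ when the data contain ties.

```python
def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1  # X and Y order the pair the same way
            elif sign < 0:
                discordant += 1  # X and Y order the pair oppositely
            # sign == 0: a tie in X or Y, counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```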
(Figure: correlation heatmap)

Phik (φk)

Phik (φk) is a relatively new correlation coefficient that works consistently between categorical, ordinal and interval variables, captures nonlinear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

(Figure: two missing-value plots; the dataset has no missing cells)

Sample

First rows

       median_income  housing_median_age  total_rooms  total_bedrooms  population  households  latitude   longitude    target
0      8.3252         41.0                880.0        129.0           322.0       126.0       37.880001  -122.230003  452600.0
1      8.3014         21.0                7099.0       1106.0          2401.0      1138.0      37.860001  -122.220001  358500.0
2      7.2574         52.0                1467.0       190.0           496.0       177.0       37.849998  -122.239998  352100.0
3      5.6431         52.0                1274.0       235.0           558.0       219.0       37.849998  -122.250000  341300.0
4      3.8462         52.0                1627.0       280.0           565.0       259.0       37.849998  -122.250000  342200.0
5      4.0368         52.0                919.0        213.0           413.0       193.0       37.849998  -122.250000  269700.0
6      3.6591         52.0                2535.0       489.0           1094.0      514.0       37.840000  -122.250000  299200.0
7      3.1200         52.0                3104.0       687.0           1157.0      647.0       37.840000  -122.250000  241400.0
8      2.0804         42.0                2555.0       665.0           1206.0      595.0       37.840000  -122.260002  226700.0
9      3.6912         52.0                3549.0       707.0           1551.0      714.0       37.840000  -122.250000  261100.0

Last rows

       median_income  housing_median_age  total_rooms  total_bedrooms  population  households  latitude   longitude    target
20630  3.5673         11.0                2640.0       505.0           1257.0      445.0       39.290001  -121.320000  112000.0
20631  3.5179         15.0                2655.0       493.0           1200.0      432.0       39.330002  -121.400002  107200.0
20632  3.1250         15.0                2319.0       416.0           1047.0      385.0       39.259998  -121.449997  115600.0
20633  2.5495         27.0                2080.0       412.0           1082.0      382.0       39.189999  -121.529999  98300.0
20634  3.7125         28.0                2332.0       395.0           1041.0      344.0       39.270000  -121.559998  116800.0
20635  1.5603         25.0                1665.0       374.0           845.0       330.0       39.480000  -121.089996  78100.0
20636  2.5568         18.0                697.0        150.0           356.0       114.0       39.490002  -121.209999  77100.0
20637  1.7000         17.0                2254.0       485.0           1007.0      433.0       39.430000  -121.220001  92300.0
20638  1.8672         18.0                1860.0       409.0           741.0       349.0       39.430000  -121.320000  84700.0
20639  2.3886         16.0                2785.0       616.0           1387.0      530.0       39.369999  -121.239998  89400.0